ABSTRACT 

The use of classroom observation is explored in 
several capacities. Specific observation instruments that were 
developed to evaluate the effectiveness of the National Follow 
Through Program were later used (sometimes in adapted forms) to study 
early childhood programs, secondary school programs, student teacher 
effectiveness, and use of time across a school district. Project 
Follow Through was intended to provide a program analogous to Head 
Start for economically disadvantaged children over a longer period of 
time. This chapter presents the fourth and most comprehensive report 
of Follow Through classroom observation data collected in spring 1973 
from 36 sites representing 7 sponsors and 7 program models (35 first 
grades and 36 third grades). Other studies using the developed 
Classroom Observation Instrument (COI) in the following areas are briefly 
outlined: (1) early childhood education; (2) English as a Second 
Language; (3) secondary school; (4) staff development; (5) effective 
use of time training; (6) student evaluation; and (7) student 
teaching. The observation techniques have provided a means to 
identify effective instructional practices in a wide range of 
classroom settings. Two tables and one figure illustrate the 
discussion. (SLD) 






Temple University Center for Research 
in Human Development and Education 



91-3 



Publication Series 



Observation for the Improvement of Teaching 






Jane Stallings and H. Jerome Freiberg 







Observation for the Improvement of Teaching 



Jane Stallings 
Professor, College of Education 
Texas A&M University 



H. Jerome Freiberg 

Professor of Curriculum and Instruction 
University of Houston 
Research Associate 
Temple University Center for Research 
in Human Development and Education 



Abstract 



This chapter explores the use of classroom observations in several capacities. Specific observation 
instruments that were developed to evaluate the effectiveness of the National Follow Through program were later 
used (sometimes in adapted forms) to study early childhood programs, secondary school programs, student teacher 
effectiveness, and use of time across a school district. A study was also conducted to evaluate the effects of using 
observation data as a basis for staff development. 



From Hersholt C. Waxman and Herbert J. Walberg: Effective Teaching: Current Research. Copyright 
1991 by McCutchan Publishing Corporation, Berkeley CA 94702. Permission granted by the publisher. 
Jane Stallings and H. Jerome Freiberg 



INTRODUCTION 



In 1969 Stanford Research Institute (SRI) held a contract with 
the National Institute of Education to evaluate National Head Start 
and Follow Through Planned Variation programs. Project Follow 
Through was established by the U.S. Congress in 1967 (the legislative 
authority was the Economic Opportunity Act of 1964 as amended) 
when it became apparent that a program was needed in the early 
grades of public school that was articulated with Project Head Start 
goals and approaches and, therefore, would provide a comparable 
educational program for economically disadvantaged children over a 
longer period of time. A clearly stated purpose of the Follow Through 
program was the enhancement of the life chances of the economically 
disadvantaged. 

According to Deutsch (1967), "children from backgrounds of 
social marginality enter the first grade already behind their 
middle-class counterparts in a number of skills highly related to 
scholastic achievement. They are simply less prepared to meet the 
demands of the school and the classroom situation. ... In other words 
intellectual and achievement differences between lower-class and 
middle-class children are smallest at the first-grade level, and tend 
to increase through the elementary school years" (pp. 64-65). However, an 
evaluation by Wolff and Stein (1966) of the first summer program of 
Head Start in 1965 had indicated that the initial achievement gains of 
the children had not been maintained in the public school. These 
early findings were believed to indicate that a more sustained pro- 
gram of longer duration might produce lasting gains. The result was 
the establishment of Follow Through Planned Variation as a longi- 
tudinal quasi-experimental program that would evaluate the ability of 
an intervention program to enhance the educational achievement of 
economically disadvantaged children. 

Project Follow Through was originally set up in a "planned 
variation" research design; that is, the goal was to examine the 
differential effectiveness of programs based on divergent educational 
and developmental theories. The program began when researchers 
and other educational stakeholders were invited by the government to 
submit plans for establishing their various programs in public schools 
in order to test whether their individual approaches could improve the 
educational achievement of economically disadvantaged children. 
From the group that came forward, twenty-two were selected to 
implement their programs as Follow Through sponsors. We refer to 
"sponsors" here as those responsible for constructing and imple- 
menting the educational programs (or models). Eleven of the twenty- 
two sponsors of educational programs had developed and tried their 
models in university settings, eight were affiliated with private 
research institutes, and three were community-developed programs. 

The sponsors of educational programs described their models to 
an audience that included representatives from school districts around 
the country at a conference in Kansas City, Kansas, in 1968. Ultimately, 
these models were implemented in 154 Follow Through 
projects within 136 urban and rural communities throughout the 
nation. The Follow Through sponsors then faced the challenge of 
program implementation, including guiding the behavior of teachers 
toward specified goals set by the sponsors. Egbert (1973) provides a 
historical view of the Follow Through project. 

In other evaluations of Follow Through Planned Variation, the 
major emphasis was to determine if the models affected children's 
performance. Yet it was clear that if such effects were found, and if the 
effects were different from one model to another, we would not know 
what caused the differences. Therefore, we needed to know what was 
actually happening in the classrooms. In order to determine whether 
the sponsors were effective in getting teachers to practice their methods 
in the classroom, it was necessary to observe the classrooms 
systematically. We wanted to know whether a child's day in the classroom 
corresponded with the sponsor's educational prescriptions. To assess 
this, we needed a comprehensive observation instrument. 

In the fall of 1969, the SRI staff, with assistance from twelve 
sponsors' representatives, developed an observation system with 
which a wide range of classroom behaviors could be recorded and 
with which objective information could probably be recorded that 
would provide a fair assessment of all sponsors' models. The proce- 
dures that were developed could record activities, materials used, 
groupings, and interactions. This chapter presents the fourth and 
most comprehensive report of Follow Through classroom observation 
data. The data for this study were collected in spring 1973 from 
thirty-six sites representing seven sponsors. The seven models of the 
chosen sponsors represent a wide spectrum of innovative educational 
theories and were selected because each model was being imple- 
mented in at least five locations. The models selected for this study 
include two models based on positive reinforcement theory (from the 
University of Kansas and from the University of Oregon), a model 
based primarily on the cognitive developmental theory of Jean Piaget 
(High/Scope Foundation), an open-classroom model based on the 
English Infant School Theory (Education Development Center), and 
three other models drawn from Piaget, John Dewey, and the English 
Infant Schools (Far West Laboratory, the University of Arizona, and 
Bank Street College). 

The study focuses on whether sponsors can deliver their educa- 
tional models to diverse communities, and explores the effects of 
training classroom personnel to use specific procedures in the class- 
room. An educational theory can be proved effective only if the 
teachers and aides carry out program specifications. Such specifica- 
tions set by sponsors include the physical arrangement of the class- 
room, utilization of the prescribed curricula, and interactions with 
children. This study addresses the following issues: 

1. Are the observed classroom programs consistent with their 
sponsors' stated intentions? That is, does the model show a 
relatively high frequency of occurrence of those elements of the 
program that the sponsor rated as important? 
2. Are the sponsored classrooms consistent within a site and 
between sites? That is, do the four third-grade classrooms at 
site A score similarly on specific program components and do 
they resemble the third-grade classrooms at sites B, C, D, and E? 

3. How do selected classroom processes relate to scores on the 
following: Metropolitan Achievement Test (MAT) (reading and 
arithmetic), Raven's Coloured Progressive Matrices, Coopersmith's 
Self-Esteem Inventory, and Intellectual Achievement 
Responsibility Scale (IAR)? 



SAMPLE 

Four first-grade and four third-grade classrooms were observed in 
each of thirty-six cities and towns. These represented five projects for 
each of six of the Follow Through educational models and six projects 
for the University of Arizona's model. One first-grade and one 
third-grade non-Follow Through classroom were selected for comparison 
at each project site. These non-Follow Through classrooms were 
combined to form two pooled comparison groups, thirty-five first 
grades and thirty-six third grades. The projects included in the 
sample represented all geographic regions, urban and rural areas, and 
several racial and ethnic groups. 

Observation sites were selected according to the following criteria: 
(1) they were among the sites where pupil testing was to occur in 
spring 1973 as part of the Follow Through evaluation; (2) each sponsor 
would, as much as possible, have a balanced geographic distribution 
of sites, which included urban-rural and north-south projects; and (3) 
each sponsor would have included at least two sites where he thought 
the model was well implemented. 

In addition to identifying classrooms for observation, we randomly 
selected four children from each classroom for individual 
observations. At each site, the primary consideration in identifying 
the classrooms and children to be observed was the availability of 
baseline data for the children when they entered school in kindergarten 
or first grade. 

In those projects where baseline data were not available, the 
Follow Through classrooms were nominated by the sponsor and the 
non-Follow Through classrooms were selected by the SRI field 
operations staff. The SRI staff selected children for individual 
observation on a random basis from classroom roster lists. 
MEASUREMENT INSTRUMENTS 

Behavior Observation 

The Classroom Observation Instrument (COI) is designed to 
record classroom arrangements and elements of events considered 
educationally significant by the Follow Through sponsors. 

Formation of Variables. Many of the individual codes are too 
molecular to serve effectively as measures of classroom educational 
characteristics. Hence, it was necessary to form theoretically 
significant variables by combining certain codes. The COI consists of 602 
categories describing behaviors of teachers and children in the 
classroom situation. The items identify materials used in the classroom, 
the grouping arrangements of teacher and children, the activities that 
occur, the behavior of an individual, and the interactions that occur 
between two or more people. 

Interaction observations were made in five-minute sequences. A 
form of shorthand was used to record the continuous action and 
interaction of selected persons in the classrooms. On two of three days 
of observation, there was an adult focus, that is, the classroom adults 
were the subjects of observation. On the remaining day, the four 
randomly selected children were the focus. Hence, the data provide 
one set of measures of classroom process (adult focus) and one set of 
child behaviors (child focus) with the same set of categories or codes. 

Observers were instructed to complete approximately four 
observations each hour during the five-hour observation day; hence, it was 
hoped that a total of twenty observations would be completed each 
day, or sixty for each classroom. For the 1,011 observation days of all 
observers the adjusted mean number of observations completed each 
day was 18.88, with a standard deviation of 2.17. Fewer observations 
occurred for certain classrooms because of intervening events during 
the class day that prohibited observations. Data from any day that 
had fewer than twelve observations were deleted. 
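
The screening rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the per-day counts are hypothetical, and the sketch does not attempt to reproduce whatever adjustment lay behind the "adjusted mean" reported above.

```python
from statistics import mean

# Threshold stated in the text: days with fewer than twelve observations were dropped.
MIN_OBSERVATIONS_PER_DAY = 12


def screen_days(observations_per_day):
    """Return only the daily counts that meet the minimum-observation rule."""
    return [n for n in observations_per_day if n >= MIN_OBSERVATIONS_PER_DAY]


# Hypothetical counts for one classroom's three observation days.
counts = [20, 11, 18]
kept = screen_days(counts)
print(kept)        # days retained for analysis
print(mean(kept))  # mean observations per retained day
```

In this made-up case the second day would be discarded, and the classroom would be described from the two remaining days.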

The data were collected on three consecutive days in spring 1973. 
In most cases, the teachers had been working with the sponsors' 
educational models for two or three years. No beginning teachers 
were selected for observation. 

Since there are over 100,000 possible combinations of codes that 
could form variables, it was important to formulate only those 
variables considered relevant to the study of sponsor implementation. The 
following sections will describe the transformation of codes from each 
portion of the classroom observation instrument into variables. 

Because this study evaluated classroom environments and class- 
room instructional processes, the classroom rather than the child was 
the unit of analysis. Classroom mean scores were also computed for 
the sample of individually observed children. Each classroom was 
assigned a value on a given variable based on the sum of the frequency 
of occurrence of the variable for the observation days. 

Classroom Summary Information (CSI) Variables. Once a day, before 
the observation with the COI started, the observer recorded 
information that identified the classroom by sponsor, site, teacher, grade, and 
observer. The observer also noted the numbers of adults and children 
present. To obtain the ratio of children to teachers and aides, the total 
number of children present on each observation day was divided by 
the total number of teachers and aides present. An average ratio over 
the three days was then computed. Total class duration was 
computed by averaging the number of class hours recorded for the three 
days of observations. 
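
A minimal sketch of the ratio computation described above, assuming each day is recorded as a pair of counts. The function name and the attendance figures are hypothetical; only the two steps (divide per day, then average over days) come from the text.

```python
def child_adult_ratio(days):
    """Average, over observation days, of children present divided by
    teachers and aides present.

    `days` is a list of (children_present, teachers_and_aides_present) pairs.
    """
    daily_ratios = [children / adults for children, adults in days]
    return sum(daily_ratios) / len(daily_ratios)


# Hypothetical attendance for three observation days.
attendance = [(24, 2), (22, 2), (27, 3)]
print(child_adult_ratio(attendance))
```

Note that averaging the daily ratios, as the text describes, is not the same as dividing total children by total adults across the three days; the two agree only when the number of adults is constant.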

Physical Environment Information (PEI) Variables. This section of the 
COI, completed once each observation day, provided two kinds of 
information: (1) seating and workgroup patterns, and (2) equipment 
and materials present and used in the classroom. The scores for a 
classroom were based on the sum of all three days. 

Classroom Checklist (CCL) Variables. The CCL variables define the 
frequency of occurrence of specific activities (e.g., group time, 
mathematics, dramatic play), of the different groupings of adults and 
children (e.g., aide with small group of children, one child without an 
adult), of grouping within particular activities (e.g., teacher with two 
children in mathematics activity), and of the use of special materials 
or equipment (e.g., texts or workbooks, audiovisual equipment) within 
the activities of mathematics, reading, social studies, and science. 
Some metavariables were formed, such as "How frequently does a child 
receive individual attention from an adult?" This metavariable was 
formed by adding many discrete subvariables, such as "How frequently 
does a child receive individual attention from an aide during 
mathematics?" plus "How frequently does a child receive individual 
attention from a volunteer during reading?" plus all other variables 
that describe an incident where one child is working with an adult. 

Five-minute Observation (FMO) Variables. This main portion of the 
COI is used to record, in the form of coded sentences, interactions 
that occur in the classroom. The Flanders Interaction Analysis 
Observation system served as the model for this section of the COI. The 
Who, To Whom, What, and How codes have functions and operational 
definitions similar to the Flanders system (1970). For this 
purpose, the observer used a series of four-celled frames (see Figure 
5-1 for frames used in preschool or kindergarten classrooms). To 
record each interaction, the observer made a check mark in the 
appropriate circle in each of the four cells of a frame. These marks 
identified the speaker, the person being spoken to, and the message 
being delivered. The How column describes the emotional affect and 
whether the conversation had an academic content or referred to 
behavior. Each frame represents a sentence. If one person asks a 
question, it is coded in the first frame. A second frame is used for the 
response of the other person, and so on. 

Figure 5-1 

Frames Used for Observations in Preschool and Kindergarten 
Classrooms 

Who and To Whom categories: T-teacher, A-aide, V-volunteer, C-child, 
D-different child, 2-two children, S-small group, L-large group, 
An-animal, M-machine. What categories: 1-command or request; 1Q-direct 
question (Q from How col.); 2-open-ended question; 3-response; 
4-instruction, explanation; 5-general comments/general action; 
6-task-related comment; 7-acknowledge; 8-praise; 9-corrective feedback; 
10-no response; 11-waiting; 12-observing, listening; NV-nonverbal; 
X-movement. How categories: H-happy; U-unhappy; N-negative; T-touch; 
Q-question; G-guide/reason; P-punish; O-object; W-worth; 
DP-dramatic play/pretend; A-academic; B-behavior. 

R-Repeat the Frame, S-Using Second Language, C-Cancel the Frame 

For example, the following three interactions would require three 
frames: 

1. TEACHER: Maria, what did you like best about the story 
Peter Pan? 

[In our shorthand, this sentence is coded TC2QA. The teacher 
(T, Who column) has asked Maria (C, To Whom column) a 
thought-provoking question (2Q, What column). The question 
concerns academic content (A, How column).] 

2. MARIA: Tinkerbell. She was very brave. 

[Shorthand: CT3A. The child (C, Who column) responds (3, 
What column) with academic content (A, How column) to the 
teacher's (T, To Whom column) question.] 

3. TEACHER: Oh yes, she was brave, wasn't she? 
[Shorthand: TC7A. The teacher (T, Who column) acknowl- 
edges (7, What column) the child's (C, To Whom column) 
academic (A, How column) response.] 

Seventy-two of these frames represent a five-minute interaction 
period. The variables formed from these complex codings were those 
that seemed most appropriate to the sponsors' models and to the 
analysis planned for this study. 

The FMO variables were selected and named to describe 
interactions relevant to sponsors' implementation. The variables are defined 
by appropriate code combinations or sentences. Generally, the FMO 
variables describe child-adult verbal interactions (i.e., questions, 
responses, instruction, comments, and feedback) and nonverbal 
interactions (i.e., nonverbal requests, responses, self-instruction, feedback, 
waiting, and observing/listening). In some cases, these FMO 
variables are further defined by the How category modifiers (such as 
academic, social behavior, happy, negative). A few variables are 
defined by the sequential ordering of certain interaction frames (e.g., 
adult question followed by child response followed by adult feedback). 
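
To make the idea of a sequence-defined variable concrete, the sketch below counts occurrences of "adult question, child response, adult feedback" in a list of coded frames. It is an illustration only, not the study's scoring procedure. The `Frame` class, the function name, and the sets of codes treated as "adult," "question," and "feedback" are assumptions drawn from the Figure 5-1 legend and the Peter Pan example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One coded interaction: the four cells of a COI frame."""
    who: str
    to_whom: str
    what: str
    how: str


# Assumed groupings of codes (hypothetical, based on the figure legend).
ADULTS = {"T", "A", "V"}      # teacher, aide, volunteer
QUESTIONS = {"1Q", "2", "2Q"}  # direct and open-ended questions
FEEDBACK = {"7", "8", "9"}     # acknowledge, praise, corrective feedback


def count_question_response_feedback(frames):
    """Count adult question -> child response -> adult feedback triples."""
    count = 0
    for first, second, third in zip(frames, frames[1:], frames[2:]):
        if (first.who in ADULTS and first.what in QUESTIONS
                and second.who == "C" and second.what == "3"
                and third.who in ADULTS and third.what in FEEDBACK):
            count += 1
    return count


# The three-frame example from the text: TC2QA, CT3A, TC7A.
peter_pan = [
    Frame("T", "C", "2Q", "A"),
    Frame("C", "T", "3", "A"),
    Frame("T", "C", "7", "A"),
]
print(count_question_response_feedback(peter_pan))
```

Storing each frame as four separate fields, rather than parsing shorthand strings such as TC2QA, avoids guessing where one cell's code ends and the next begins.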
Observer Reliability 

All observers were trained in a seven-day intensive training course 
delivered at four national training sites. Potential observers watched a 
specimen videotape, and only those who met the final criterion of 
coding interactions with a reliability of 70 percent or above were 
employed to collect data for this study. Of the original seventy-two 
who were trained, sixty-three met this criterion. Nine more observers 
were trained in a special session to fill these vacancies. 

The observers began work in classrooms; after approximately two 
weeks, twenty simulated classroom situations were videotaped and 
shown to the observers to code. Each simulation was approxi- 
mately twenty interaction frames long. These simulations attempted 
to present several concise, clear examples of each code used in the 
COI. Each simulation began with a still frame in which the narrator 
identified the focus of the observation (a teacher or a child). Each skit 
was shown with a two-second pause between interactions. The ob- 
servers were instructed to code one interaction frame during each 
pause. 

Matrices were constructed for each observer. A form was pre- 
pared that listed all sixteen What codes across the top of the matrix 
and down the side. Those across the top were the "true" codes as 
judged by the investigators; the numbers of instances of each occur- 
ring in the twenty situations were listed in the row under the labels. 
The codes listed down the side were the actual codes ascribed by the 
observer being tested. The reliability booklets for each observer were 
examined frame by frame, and tallies were made of each observer's 
coded interaction sequences. If the observer's coding agreed with the 
criterion, a tally was placed in the intersection of the row and column. 
The principal diagonal, then, contains the cells indicating the ob- 
server's correct coding; other cells contain incorrect coding. The row 
totals are the total number of times an observer recorded each code, 
whether correctly or not. The number of criterion examples shown 
across the top could be compared with the diagonal to compute the 
observer's reliability for each code. An examination of a particular 
cell in the row reveals whether the code was recorded correctly or 
incorrectly, and, if recorded incorrectly, the row cells show exactly 
which codes were confused with one another. (This is reported in 
great detail by Stallings and Giesen, 1977.) 
Thus, observer bias can be assessed by examining the overuse, 
underuse, or confusion of codes. In this study, each observer was 
responsible for observing one grade level at a single location, and 
therefore the data collected by each observer are identifiable in the 
analysis. The value of this method for measuring accuracy is that it 
contributes directly toward interpreting the data. 
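
The matrix bookkeeping described in this section can be sketched as follows, assuming each test frame is stored as a pair of the criterion ("true") code and the code the observer recorded. The function name, the dictionary keys, and the sample data are hypothetical; the per-code agreement is taken here as correct codings divided by the number of criterion examples, which is one reading of the comparison described above.

```python
from collections import Counter


def per_code_summary(pairs):
    """Summarize one observer's coding against the criterion.

    `pairs` is a list of (true_code, observed_code) tuples.
    """
    criterion = Counter(true for true, _ in pairs)             # counts across the top
    recorded = Counter(observed for _, observed in pairs)      # row totals
    correct = Counter(true for true, observed in pairs if true == observed)  # diagonal
    confusions = Counter((observed, true) for true, observed in pairs
                         if true != observed)                  # off-diagonal cells
    summary = {
        code: {
            "agreement": correct[code] / criterion[code],
            "criterion": criterion[code],
            "recorded": recorded[code],
        }
        for code in criterion
    }
    return summary, confusions


# Hypothetical results: one acknowledgment (7) was miscoded as praise (8).
pairs = [("3", "3"), ("3", "3"), ("7", "8"), ("7", "7"), ("8", "8")]
summary, confusions = per_code_summary(pairs)
print(summary["7"])
print(confusions)
```

A code whose "recorded" count exceeds its "criterion" count is being overused by that observer, and the confusion counts show which code it is displacing.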

Other Child Measures 

The children's ability when they began school was assessed by the 
Wide Range Achievement Test (WRAT). It was administered to 
children when they entered school, either at the kindergarten or 
first-grade level. 

Reading and mathematics skills were assessed by the MAT in 
both first and third grades. Problem-solving skills (perceptual) were 
assessed in third grade only, using the Raven's Coloured Progressive 
Matrices. This test, designed as a culture-fair, fluid intelligence test, 
was adopted for use in the evaluation as a measure of nonverbal 
reasoning and problem-solving ability in visual perceptual tasks. The 
Intellectual Achievement Responsibility Scale (IAR), used in third 
grade only, assessed the extent to which the child takes responsibility 
for his own successes or failures (i.e., internal locus of control) or 
attributes his achievements to the operation of external forces (e.g., 
luck or fate). Child behaviors were assessed through systematic 
observations recorded on the COI. Absences from school were 
determined from school records. 



CONSISTENCY OF CLASSROOM PROCESSES 

We examine in this section the day-to-day variability of what 
occurs in the classroom. For the purposes of this chapter, we would 
like to have consistent descriptions of what was occurring in a 
classroom during spring 1973. The activities and interactions that 
occur in a classroom no doubt change radically over the course of a 
school year as adults and pupils become acquainted, as the subject 
matter changes, and as holiday seasons pass. Even when inferences 
are confined to the spring of the school year, after the Follow Through 
teacher has had approximately six months to implement a sponsor's 
model, classroom processes no doubt vary from day to day. It was, 
therefore, important to find out how stable our descriptions of those 
processes could be when based on only a few days of observation. 

Three consecutive days of observation were scheduled for each 
classroom, both Follow Through and non-Follow Through. The 
values of the Classroom Checklist (CCL) variables are based on all 
three days of observation; the values of the adult/activity focus 
Five-Minute Observation (FMO) interaction variables were based on 
two days of observation, while the child-focus observations were 
based on a single day per class. 

A subset of CCL and FMO variables was chosen for the 
assessment of the stability of the classroom processes. The variables were 
selected on the basis of how well they described sponsors' programs. 
Results from previous evaluations (Stallings, 1973; Stallings, Baker, 
and Steinmetz, 1972) were used in the selection. 

For each variable, the correlations were computed between the 
observed values on the two days of adult/activity focus observations. 
The Spearman-Brown formula was applied to the correlations to 
derive the consistency of two or three days of observation for the FMO 
and CCL variables, respectively. (Since there was only a single day of 
child-focus observations per class, the child-focus FMO variables 
were not included in this analysis.) 
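
For readers unfamiliar with it, the standard Spearman-Brown formula projects the reliability of a measure lengthened by a factor of k from the reliability r of a single unit: k r / (1 + (k - 1) r). The sketch below applies it with k set to the number of observation days. The function name and the example correlation are hypothetical, and the text does not say exactly how the single-day correlation for the CCL variables was obtained.

```python
def spearman_brown(r, k):
    """Projected consistency of k days of observation, given the
    correlation r between single days (standard Spearman-Brown formula)."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical day-to-day correlation for one variable.
single_day_r = 0.5
print(spearman_brown(single_day_r, 2))  # two days, as for the FMO variables
print(spearman_brown(single_day_r, 3))  # three days, as for the CCL variables
```

The projected coefficient rises with the number of days, which is why a modest day-to-day correlation can still yield a consistency coefficient above 0.70 for a three-day composite.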

The consistency coefficient reflects the variability of the obtained 
classroom means, part of which is a product of "true" variance in 
classroom variables, while the remainder stems from measurement 
errors. Because of the method of determining observer reliability 
(measurement error), there is no satisfactory way to untangle the two 
by a correction for attenuation. 

A primary factor contributing to less-than-perfect consistency is 
the assumed variability of the classroom processes from day to day. 
Another factor is the variability of the children's absences from day to 
day and differences in the number of absences across classrooms. A 
high consistency coefficient, say above 0.70, indicates that the 
classrooms maintain approximately the same rank order on observed 
scores from day to day. This would indicate that error due either to 
day-to-day variability within classrooms or to absences is slight, 
although it would not rule out the possibility of systematic error 
operating across absences. 

For all classrooms combined, both sponsored and non-Follow 
Through, the coefficients are reasonably high. Those for the CCL 
variables are above 0.70, with the exception of variable 66 (numbers, 
mathematics, arithmetic) for the third grade, where the coefficient 
was 0.68. For the adult/activity focus FMO variables, the coefficients 
were all above 0.85, with the exception of variable 374a (adult 
instruction, academic) for first grade, where the coefficient was 0.74. 

For the individual sponsors, approximately 84 percent of the 140 
coefficients had a value of 0.70 or more. The reliability coefficient for 
variable 66 (numbers, mathematics, arithmetic) was below 0.70 in six 
out of the fourteen cases. In particular, the coefficients were extremely 
low for both grade levels of the University of Arizona and for the third 
grades of Bank Street and of the University of Oregon. The negative 
coefficient for Bank Street's third grade is the result of one classroom 
in which an extremely high proportion of the class time was spent in 
mathematics on the first day and a small proportion on the second 
day. The extremely low-consistency coefficients for the University of 
Oregon on variable 66 in the third grade and variable 67 in the first 
grade are notable because this sponsor's program is considered more 
structured than others. 

In summary, the coefficients computed over all classrooms 
indicate that the consistency of instructional processes was surprisingly 
high. The differences among classrooms account for a substantial 
portion of the variability among the variables we have selected. The 
same conclusion holds with a few exceptions for the coefficients 
computed for each sponsor and grade level. The only variable for 
which the day-to-day consistency was low for several sponsors was 
variable 66 (average amount of time that a child was observed to be 
engaged in numbers, mathematics, arithmetic). 



MEASUREMENTS OF APPROXIMATION TO THEORETICAL MODEL 

The first step in the assessment of classroom implementation was 
to describe each educational model in detail. These descriptions were 
prepared by our staff and reviewed by the sponsors, then revised 
according to the sponsors' specifications. The second step was to 
create variables from the codes used in the observation instrument 
that would describe representative elements of each sponsor's model. 
Each sponsor identified those variables that were (1) important to his 
model and (2) expected to occur more frequently than in conventional 
classrooms. A list of variables was made for each of the seven models. 
The number of variables ranged from sixteen for the University of 
Oregon to twenty-eight for Far West Laboratory (see Table 5-1). The 
critical list of variables describes a sponsor's model only in part; the 
observation instrument employed in the study is not designed to 
capture the important subtle processes of some of the programs. For 
example, a goal of Far West Laboratory is to have teachers establish 
environments where a child can search for solutions to problems in his 
own way and can risk, guess, and make discoveries without serious 
negative psychological consequences. It was not possible for us to 
measure directly the extent to which such an environment had been 
established. 

Since the Follow Through programs are intended to be innovative 
and to represent alternatives to the conventional classroom, a pool of 
non-Follow Through classrooms was used as the standard from which 
Follow Through classrooms were expected to differ in specified ways. 
The standards were established separately for first and third grades. 

With observational data, the distribution of scores rarely follows a
normal curve; thus a nonparametric scaling technique was used in the
implementation analysis. Implementation scores for each sponsor
were determined by rank-ordering the non-Follow Through classrooms'
mean scores on each sponsor variable and then dividing the
distribution into five equal parts, or quintiles. Each sponsor classroom
has a score on each variable and falls within a quintile range. A
sponsor's implementation score on any variable is always a score
between 1 and 5. This represents the position of a Follow Through
classroom score relative to the distribution of non-Follow Through
scores.
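
The quintile scaling described above can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used in the study. The function names, the rule for placing the cut points, and the treatment of ties (a score equal to a cut point falls in the lower quintile) are all assumptions, and the sketch expects at least five comparison classrooms.

```python
from bisect import bisect_left


def quintile_cutpoints(comparison_means):
    """Return four cut points that split the rank-ordered comparison
    (non-Follow Through) classroom means into five equal parts.

    Assumption: each cut point is the highest score in its fifth of the
    ranked list; the source does not say how boundaries were set.
    """
    ranked = sorted(comparison_means)
    n = len(ranked)
    return [ranked[(n * k) // 5 - 1] for k in range(1, 5)]


def implementation_score(classroom_mean, cutpoints):
    """Map one classroom's mean on a variable to a score from 1 to 5.

    Assumption: a score equal to a cut point is placed in the lower quintile.
    """
    return bisect_left(cutpoints, classroom_mean) + 1


# Hypothetical comparison-pool means on one sponsor variable.
comparison = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cuts = quintile_cutpoints(comparison)
scores = [implementation_score(x, cuts) for x in (0, 3, 8, 9, 100)]
```

Because the score depends only on rank position within the comparison pool, it does not require the raw observation scores to be normally distributed.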

Using each variable designated as critical by the sponsor of a
model, a total implementation score was computed for each classroom
in each project location and for each sponsor. In order to assess the
degree of implementation achieved by Follow Through classrooms, a
total implementation score was also computed for each non-Follow
Through classroom on each sponsor's set of implementation variables.
The mean and standard deviation of the non-Follow Through
pooled classrooms are reported for each sponsor separately for first
and third grades. One-tailed "t" tests were computed to test for the
significance of the differences between each Follow Through sponsor's
classrooms and the non-Follow Through classrooms. Analyses of
variance were also computed to examine the within-site and among-site
differences in total implementation scores for each sponsor.
Implementation is judged on two criteria: (1) Do the sponsored
classrooms differ significantly from non-Follow Through classrooms?
and (2) Are the classrooms similar in implementation both within
projects and among projects? (See Stallings, 1975, p. 26, for this
statistical procedure.)
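
As a rough illustration of the first criterion, the sketch below totals a classroom's 1-to-5 variable scores and computes a t statistic comparing sponsor classrooms with the comparison pool. Several things here are assumptions rather than details from the study: that the total is a simple sum of the variable scores, that an unequal-variance (Welch) form of the t statistic is appropriate, and all function names and data. The actual procedure is the one given in Stallings (1975).

```python
from math import sqrt
from statistics import mean, variance


def total_implementation_score(variable_scores):
    """Assumed definition: sum of a classroom's 1-to-5 scores across the
    variables its sponsor designated as critical."""
    return sum(variable_scores)


def welch_t(sponsor_totals, comparison_totals):
    """Return (t, degrees of freedom) for the difference in mean totals.

    Assumption: Welch's unequal-variance form; the source states only that
    one-tailed t tests were used.
    """
    n1, n2 = len(sponsor_totals), len(comparison_totals)
    se2_1 = variance(sponsor_totals) / n1      # squared standard error, group 1
    se2_2 = variance(comparison_totals) / n2   # squared standard error, group 2
    t = (mean(sponsor_totals) - mean(comparison_totals)) / sqrt(se2_1 + se2_2)
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical totals for three sponsor and three comparison classrooms.
t_stat, dof = welch_t([4, 5, 6], [1, 2, 3])
```

For a one-tailed test, the resulting t would be compared with the upper-tail critical value at the chosen significance level for the computed degrees of freedom.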



RESULTS 

The data obtained from this large sample indicated that the
models in Head Start and Follow Through Planned Variation programs
were very effective in training teachers in diverse locations to
instruct in compliance with the models (i.e., Bank Street teachers in
Tuskegee looked similar to Bank Street teachers in New York City).

Further analyses of the observation data indicated that instructional
processes identified with exploratory models explained 45
percent of the variance in scores on the Raven's Progressive Matrices.
Instructional variables identified with direct instruction models
explained 37 percent of the variance in reading achievement and 64
percent in mathematics achievement (Stallings, 1975). This was one
of the first national evaluations of educational models to use a
comprehensive observation system linking classroom processes to
student outcomes.



OTHER STUDIES USING THE COI 

Early Childhood Education 

Following the initial study, the COI was used in a California
study of an early childhood education program (Stallings, Cory,
Fairweather, and Needels, 1979). The evaluation focused on the
instructional processes of teachers in schools classified as having
students with increasing achievement scores, compared with the
instructional processes of teachers in schools where students'
achievement scores were decreasing. This evaluation indicated more variance
in instructional processes within schools than among schools. Overall,
observation variables identified with direct instruction methods were
significantly related to higher student achievement scores regardless
of how the school had been classified.

Puerto Rican— English as a Second Language 

The versatility of the coding system allows each coded interaction 
to be identified as English or Non-English. This capability was used 
in an evaluation of the quantity and quality of English being spoken in 
Puerto Rican classrooms. Puerto Rican observers were trained in a 
ten-day session to collect data reliably on the COI. Data were
collected in urban and rural elementary and secondary schools, and 
observations occurred over two full days in randomly selected class- 
rooms. The use of English was calculated as a percentage of the total 
recorded interactions. The quality of English used was assessed by 
reviewing tape recordings. (See Rivera-Medina, 1981.) 
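
The percentage calculation described above is simple enough to show directly. In this minimal sketch the language codes "E" (English) and "N" (Non-English) are hypothetical labels chosen for illustration; the actual COI coding symbols are not given in the text.

```python
def percent_english(interaction_codes):
    """Percentage of recorded interactions coded as English.

    Assumption: each interaction carries one language code, "E" for
    English or "N" for Non-English (hypothetical labels).
    """
    if not interaction_codes:
        raise ValueError("no interactions recorded")
    english = sum(1 for code in interaction_codes if code == "E")
    return 100.0 * english / len(interaction_codes)


# Hypothetical record of four coded interactions from one classroom.
share = percent_english(["E", "N", "E", "E"])
```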



MODIFICATION FOR SECONDARY SCHOOL 



For a study of teaching basic reading skills in secondary schools 
(Stallings, Fairweather, and Needels, 1978), the COI was modified to 
record the activities and instructional processes occurring in secon- 
dary classrooms. This study identified forty-one observation variables 
that were significantly related to a gain in reading achievement scores. 
Modifications in the coding and in the training program have been 
made to accommodate other subject areas such as science, social 
studies, mathematics, and physical education. For example, in a 
study of factors influencing women to take advanced mathematics
classes (Stallings and Robertson, 1979), the COI was modified to 
identify when teachers were speaking to male or female students. The 
coding system provided variables that could be used to compare the 
nature of the interactions between male students and teachers with 
those between female students and teachers. Counter to our predic- 
tion, we found no significant differences in the classroom interactions 
among teachers and their male and female students. Because modifi- 
cations to the program have occurred over the past ten years, we 
changed the name of the observation system to Stallings Observation
System (SOS).

Table 5-2

Self-Analytic Model of Staff Development

Baseline/Pretest
    Observe teachers
    Prepare individual profiles of behavior
    Teachers assess what change is needed
    Teachers set goals
    Start where teachers are in skill development

Inform
    Provide information about research findings on effective practice
    Link theory, research, and practice
    Check for understanding by eliciting practical examples
    Ask: Why might that be? How does that work in your classroom?

Guided Practice: Integration
    Provide conceptual units one at a time
    Teachers adapt to own context and style
    Teachers assess and provide feedback via peer observations
    Teachers make a commitment to try a new idea in class the next day

Post-Test Observations
    Observe teachers: prepare second profile
    Teachers analyze profiles
    Teachers set new goals
    Assess training program for effectiveness



STAFF DEVELOPMENT BASED ON OBSERVATION 

Phase I of the study of secondary basic reading skills was a
year-long quasi-experiment in which very specific instructional
variables were identified. Using these variables we constructed a
staff-development program (named the Effective Use of Time [EUOT]).
This training program, which was Phase II of the study and was
based on an interactive theory of adult education, guided teachers to
use the effective strategies. (See Table 5-2 for the model.)

In the year-long experiment, the teachers in the experimental
classes successfully implemented the EUOT program, and their 
students gained six months more on reading achievement tests than 
did students in control classrooms. Findings from Phase I and II 
correlations and analysis of variance were remarkably similar. Sum- 
marizing the two data sets, we established the criteria shown in one 
teacher's profile (see Figure 5-2). These criteria then formed the basis 
for our recommendations for change in the teacher's behavior. The
criteria are adjusted according to the achievement level of the
students (Stallings, 1986).

Figure 5-2

The Percentage of Time Devoted by a Teacher to Certain Classroom
Activities, in Relation to Established Criteria, with Recommendations for
Change.



DISSEMINATION 

Federal and state education agencies, concerned for the many
students in secondary schools who could not competently read, write,
or compute, found the findings from the secondary reading studies of
considerable interest. Subsequently, under the auspices of the
Stallings Teaching and Learning Institute, the EUOT program was
disseminated through the National Diffusion Network, and funding to
assist in dissemination has continued from 1980 through 1990. The
EUOT program has been widely disseminated. Trainers for the
EUOT program were certified at the Stallings Teaching and Learning
Institute, at Vanderbilt University and the University of Houston.
These participants included college of education faculty, school
staff developers, and state department of education personnel. A body
of research has evolved from these participants and from numerous
student dissertations. The development of lap-top computer technology
has allowed for more in-depth analysis and immediate feedback to
teachers by providing instant profiles.

EUOT has been implemented in Branson, Missouri, over a
three-year period. There, selected teachers were certified as trainers
and observers to disseminate the program to all teachers in the
district. Significant behavior change has been recorded for teachers
and students throughout the project. In fall 1986 the Missouri
Department of Elementary and Secondary Education identified Branson
as a "Successful Project." Governor Ashcroft stated, "In the
Branson school district, teachers and administrators have reported
significant success with their Effective Use of Time (EUOT) program.
This in-service experience helps teachers see how well they use class
time and gives them strategies for using class time more effectively.
The teachers involved in the program reported that EUOT helped
them improve their skills significantly" (Orth, 1987, p. 4).



EUOT RESEARCH 

Anderson (1984) examined the use of the SOS variables combined
with Effective Use of Time (EUOT) training to improve instruction in
the Washington, D.C., public schools. Her study focused on the
changes in the teaching behavior of twenty-nine junior high school
teachers who were trained by five different EUOT trainers. The SOS
was used to determine the degree of change from the beginning to the
end of the semester. The study examined the difference between the
change of groups taught by four district trainers and one taught by an
external trainer consultant. The teachers in the external consultant's
group were found to change their behavior more than did teachers in
the other groups. Anderson found that the most change occurred
when the trainer (1) provided frequent teacher interactions, (2)
discussed the observation variables frequently, (3) made frequent
supportive statements to try new ideas, and (4) stayed focused on the
topic of the seminar (i.e., managing and motivation, student behavior,
asking higher-level questions).

Longitudinal Study 

Devlin-Scherer, Schaffer, and Stringfield (in press) conducted a 
follow-up observation study of an Effective Use of Time Program, for 
which they selected a sample of ten teachers who reflected high and 
low implementors from the original EUOT observations. They had 
three points of observation data (before the training, after the train- 
ing, and two years later). High implementors had scores above the 
mean at the end of the training and the low implementors had post 
scores below the group mean. The ten teachers were observed and 
interviewed two years after receiving training in order to determine 
the long-term impact of the training on their teaching. The follow-up 
observations were compared with the initial observations on thirty-two
variables. The average for the group of ten remained about the
same over the elapsed time (i.e., change on eighteen of the observed 
variables was maintained at the same level as at the end of the 
training). On eight of the variables, the group's average was reverting 
to their initial behavior. Analyses of individual teachers' profiles
revealed that teachers who had initially implemented the variables 
successfully were more likely to sustain their change than were 
teachers who implemented at a lower level. The high-implementing 
teachers indicated in interviews that the workshops provided them 
with present and future assistance. The low-implementing teachers 
indicated that sessions were confirmations of what they knew. They 
enjoyed the workshops more as opportunities to interact with peers. 
High implementors were able to identify specific skills they used in 
their classrooms. Low implementors were more global in their re- 
sponses and less likely to identify specific skills. 
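
The high/low grouping rule used in this follow-up study (post-training scores above or below the group mean) can be sketched as follows. The teacher labels and scores are invented for illustration, and the handling of a score exactly equal to the mean (assigned to neither group) is an assumption, since the source does not address that case.

```python
from statistics import mean


def split_implementors(post_scores):
    """Split teachers into high and low implementors by comparing each
    post-training score with the group mean.

    post_scores: dict mapping a teacher label to a post-training score.
    Assumption: a score exactly at the mean is placed in neither group.
    """
    group_mean = mean(post_scores.values())
    high = sorted(t for t, s in post_scores.items() if s > group_mean)
    low = sorted(t for t, s in post_scores.items() if s < group_mean)
    return high, low


# Hypothetical post-training scores for four teachers.
high, low = split_implementors({"A": 80, "B": 60, "C": 70, "D": 50})
```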

Teacher Commitment 

A study by Devlin-Scherer and colleagues (1985) entitled "The
Effects of Developing Teacher Commitment to Behavioral Change"
responds to the concern of measuring the effectiveness of training
programs.

[...] elementary and secondary teachers were trained by pairs of
university, principal, or teacher trainers in the Stallings Effective Use of
Time Program. Workshop sessions were audio-recorded and analyzed to
determine the impact of verbal commitment behavior on changes in classroom
teaching behavior. Using the SOS, a comparison of pre- and post-classroom
observations indicated that teachers who stated public commitments to
behavioral changes each week more often followed through and made these
behavioral changes in their classroom teaching than did teachers who did
not make such public commitment. [P. 31]



SOS FOR EVALUATION 



Stringfield, Teddlie, and Suarez (1985) used the SOS to examine
the classroom instructional processes of two Louisiana schools. One
was identified as high achieving and the other was identified as low
achieving. The majority of the students in both schools are white, and
black, Hispanic, and Asian students form the minority population.
Each school is located in middle-class, single-family-dwelling
neighborhoods. The site team observers (who were blind to the
achievement status of the schools) noted that students at the low-achieving
school spent about one hour less a day doing academic tasks. During
six days of observations, few classes began at 8:30 A.M. as scheduled.
Many students were in the halls when the bell rang. The researchers
indicated that there was "a constant stream of children to and from
bathrooms, the office, the library, and in some cases, just hanging out
in the halls" (p. 34). According to the SOS data, students in the
high-achieving school received nearly twice as much interactive
instruction as did students in the low-achieving school.



STUDENT TEACHING 



A study by Harris (1988) included a sample of fifty student
teachers. Over a fifteen-month period, twenty student teachers
participated in full treatment of SOS feedback plus EUOT workshops and
seven in feedback from SOS treatment only; there were twenty-three
controls. Change was measured with eleven variables aggregating
[...] subjects improved (moved toward criterion levels) for eight of eleven
variables, with change significant at the 0.05 level for teachers
monitoring students, students in interactive instruction, and students
off task. SOS feedback-only subjects improved for nine variables, with
change significant for teachers interactively instructing, teachers
managing, and students in interactive instruction. Implications for
teacher education suggest that the feedback portion of the EUOT
program is effective during the preservice teaching experience, that the
portion of the EUOT workshops dealing with interactive instruction
effects a change beyond that of SOS feedback only, and that trainer as
observer increases student teachers' classroom management.

Freiberg and Waxman (1988) used three approaches for providing
feedback to student teachers that have not been widely used but
have great potential for improving the classroom instruction of
preservice teachers. The methods include (a) feedback from pupils, (b)
systematic feedback from a classroom observation system (SOS), and
(c) self-analysis of classroom lessons through an audiotape analysis
(Low Inference Self-Assessment System: see Freiberg, 1987). The
authors found that these feedback approaches, individually or
collectively, provide student teachers, cooperating teachers, and university
supervisors with excellent data for strengthening the preservice
teaching experience. (See Freiberg, Waxman, and Houston, 1987.)

Student Teaching in Inner-City Schools 

The purpose of the Learning to Teach in Inner-City Schools
project (LTICS) is to develop teachers who choose to teach in
inner-city schools and are effective in teaching inner-city children.
Historically, most new teachers did their student teaching in the
suburbs. Those hired for inner-city schools had little preparation to
serve children who come from a wide variety of cultural backgrounds
and from low socioeconomic families. The dropout rate of new teachers
in inner-city schools is reported to be twice the average for the
nation (Stallings, Martin, and Bossung, in press).

The goal of LTICS is to change this history of failure by novice
teachers in inner-city schools to one of success. To this end, a
partnership was established between a school district serving inner-city
students and a college of education that trains student teachers.
The partnership created a professional development school that
provides a structure in which a group of supervising teachers, college
supervisors, and ten to twelve student teachers per semester learn to
implement effective instructional strategies for inner-city school
populations. This occurs through shared required weekly seminars that
follow the EUOT format. The seminars focus on the problems and
solutions of teaching inner-city children (e.g., holding high expectations,
working with parents and their children, assessing children's prior
knowledge and experiences, planning appropriate lessons, managing
classroom time, motivating and managing positive student behavior,
and developing reflectivity and thinking skills). Seminars are taught
by school and college faculty and community/parent representatives.

Student teachers and supervising teachers are observed with the
SOS at the beginning of each semester and set goals for instructional 
change. The percentage of time children spend on academic tasks is 
computed and analyzed for change; these statistical analyses have 
indicated significant change each semester in student teacher and 
teacher behavior. The impact of LTICS is also evaluated by calculat- 
ing the percentage of student teachers graduating from LTICS who 
choose to teach inner-city or other at-risk populations (85 percent at 
this time). Follow-up interviews with LTICS graduates indicate job
satisfaction, and their principals give them high ratings. 



SUMMARY 

Observation in classrooms serves many purposes. Most often 
observation is used to evaluate teachers and students. The flexibility 
of the SOS has provided a means to identify effective instructional 
practices in a wide range of classroom settings. The specificity of the 
SOS variables and their face validity have made it relatively easy to 
translate them into teaching behaviors, and these data from the study 
have provided the content for extensive in-service and preservice
professional development. The profiles of teaching behaviors observed
in a pretest and posttest design provide a continuing basis for
evaluation and improvement of the EUOT program.